EVALUATION

This effort extended the work performed under RADC contract
F30602-75-C-0082, entitled "Coding of Aerial Reconnaissance Images for
Transmission Over Noisy Channels," by an objective comparison of the effect
of source/channel encoding improvements in transmitting digitized, high
resolution imagery. The bases of this comparison were mean squared error,
mean absolute error, and, most significantly, the assessment of professional
photo-interpreter/analysts (the ultimate users of such imagery). Source
encoding is intended to reduce non-essential (to the user) redundancy in
original imagery, but this increases the sensitivity of the source-encoded
data to communication channel noise. Thus channel encoding (efficient bit
apportionment/quantization, data formatting, and added redundancy for error
detection and correction) is necessary to minimize this noise sensitivity.
The present research effort, which emphasized improved bit apportionment
and quantization for realistic noisy channel conditions, provides for
enhanced quality in reducing the bandwidth requirements for reconnaissance
imagery communication.

Project  Engineer 


I.  Introduction 

An earlier study [1] has shown the potential for non-entropy-preserving
coding of high resolution imagery such that photo-analysts are pleased with
the reconstructed pictures. The present study has concentrated on the
source coding methods with the most potential for this application and has
resulted in refinement of three methods (multiclass zone block transform,
hybrid, and block truncation). New subjective rankings have been performed
which show significant advances in coding performance.


II. Two-Dimensional Block Transform Coding with Multiclass Zones

Adaptive two-dimensional block transform coding methods such as those
described by Chen and Smith [2] have produced the best ranked results. In
such methods, the Two-Dimensional Fast Discrete Cosine Transform [3,4] is
performed over sub-blocks of the original image. The blocks are sorted into
classes based on one or more statistical features. Then bits are assigned
to each class based on coefficient variances within each class. Detailed
descriptions of such techniques are found in [1].

Reported  in  this  section  are  (1)  the  use  of  non-equal  numbers  of  blocks 
in  each  class;  (2)  methods  of  assigning  blocks  to  classes  which  result  in 
improved  mean-square  error  performance  and  higher  subjective  ratings;  (3)  a 
method  of  preprocessing  the  image  to  improve  subjective  ratings  of  the 
reconstructed  pictures;  and  (4)  resolution  and  quantization  considerations. 
II.1. Variable Number of Blocks Per Class

A typical adaptive multiclass zone method is diagrammed in Fig. II-1.
In the implementation described by Chen and Smith [2], the total ac energy
is used to classify each block into one of 4 classes, so that each class has
an equal number of blocks assigned to it. The purpose of having an equal
number of blocks per class is to ensure easily that the average coding rate
over the entire image is maintained. However, this appears to be an
unnecessary restriction, in that the bit assignment algorithm can easily use
the number of blocks in each class to assign bits so that the average coding
rate is maintained.

The bit assignment algorithm assigns bits simultaneously to all
coefficients in all classes proportional to the logarithm of the sample
variance of each coefficient. The number of bits assigned to the (u,v)th
coefficient belonging to class k is

$$N_k(u,v) = \tfrac{1}{2}\log_2\left[\sigma_k^2(u,v)\right] - \log_2 D \tag{II-1}$$

where $\sigma_k^2(u,v)$ is the sample variance of the (u,v)th coefficient
over all blocks in class k and D is a constant which sets the compression.
This assignment is optimal in mean-square-error performance assuming
Gaussian distributed, uncorrelated coefficients [5].

The parameter D can be determined using the following steps:

Let $B_{dc}$ = number of bits assigned to the dc coefficient
in each class.

$B_{avg}$ = desired average coding rate in bits/pixel
(not including overhead).

$B_k$ = total number of bits assigned to the ac coefficients
of a block in class k.

$C_k$ = number of blocks assigned to the kth class.

N = total number of pixels in a block.

M = total number of pixels in a picture.

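Then

$$B_k = \frac{1}{2}\sum_{(u,v)\neq(0,0)} \log_2 \sigma_k^2(u,v) - (N-1)\log_2 D \tag{II-2}$$

for k = 1, 2, ..., K. Also,

$$\sum_{k=1}^{K} C_k\,(B_k + B_{dc}) = M \cdot B_{avg} \tag{II-3}$$

Substituting Eq. II-2 into Eq. II-3, and noting that the class sizes sum to
the total number of blocks M/N, gives

$$\log_2 D = \frac{N}{2M(N-1)} \sum_{k=1}^{K}\Big\{ C_k \sum_{(u,v)\neq(0,0)} \log_2 \sigma_k^2(u,v) \Big\} + \frac{B_{dc} - N \cdot B_{avg}}{N-1} \tag{II-4}$$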


Because of round-off error, the exact desired number of bits may not be
assigned on the first trial. Therefore, the value of D can be modified as
follows, where primes indicate the values obtained on the first trial.
From Eq. II-3,

$$\sum_{k=1}^{K} C_k\,(B_k - B'_k) = M\,(B_{avg} - B'_{avg}) \tag{II-5}$$

and, using Eq. II-2,

$$\log_2 D = \log_2 D' - \frac{N\,(B_{avg} - B'_{avg})}{N-1} \tag{II-6}$$

Of  course  the  advantage  of  using  a variable  number  of  blocks  in  each 
class  is  that  small  unique  regions  of  a picture  (e.g.,  islands  in  an  ocean) 
can  be  given  their  own  class  and  optimum  bit  assignment. 
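
The determination of D can be illustrated with a minimal Python sketch of Eqs. II-1 through II-4 as read here. The function names and the toy numbers (2 x 2 blocks, two classes) are hypothetical and chosen only to keep the example small. The bit counts are left unrounded and unclipped, so the rounding correction of Eq. II-6 is not exercised.

```python
import math

def log2_D(variances, counts, n_block, b_dc, b_avg):
    """Eq. II-4. variances[k] lists the N-1 ac-coefficient sample
    variances of class k; counts[k] is C_k, the blocks in class k."""
    N = n_block
    M = N * sum(counts)  # total pixels = pixels per block * number of blocks
    weighted = sum(c * sum(math.log2(v) for v in var_k)
                   for c, var_k in zip(counts, variances))
    return N / (2 * M * (N - 1)) * weighted + (b_dc - N * b_avg) / (N - 1)

def ac_bits(var_k, l2d):
    """Eq. II-1 for every ac coefficient of one class (no rounding)."""
    return [0.5 * math.log2(v) - l2d for v in var_k]

# Toy data (assumed): N = 4 pixels per block, so 3 ac coefficients per class.
variances = [[400.0, 100.0, 25.0], [16.0, 4.0, 1.0]]
counts = [3, 13]              # unequal class sizes are allowed
N, b_dc, b_avg = 4, 8, 3.0
M = N * sum(counts)

l2d = log2_D(variances, counts, N, b_dc, b_avg)
# Left-hand side of Eq. II-3: total bits over all blocks.
total_bits = sum(c * (sum(ac_bits(v, l2d)) + b_dc)
                 for c, v in zip(counts, variances))
```

With unequal class sizes the total still meets the target M * B_avg, which is the point of letting the bit assignment use C_k directly. A practical coder would round and clip the per-coefficient bit counts and then re-adjust D.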

II.2. Selection of Classes

II.2.1. Energy and Frequency Content

Although ac energy is often the most useful parameter for dividing
blocks into classes (low energy classes get few bits), such an assignment
neglects the frequency distribution of energy. For example, a group of
blocks may have a low total energy but with most of their energy in the high
frequencies. Other blocks with low energy might be primarily low frequency
blocks. These two groups would be put in the same class based on energy, but
the bit assignment would have to spread over both the high and low frequency
ranges. However, if these two groups could be put into separate classes,
the bit assignment would be quite different, thus resulting in a more
efficient code. Such a system has been implemented by using the ac energy to
divide blocks into two classes and then using the ratio of low frequency
energy to high frequency energy to sub-divide each of the resulting two
classes. The sample mean was used as the class boundary in each case. This
system is to be compared to one which uses energy only to divide into four
equal classes as described in [2].
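
The two-stage split described above can be sketched as follows. This is only an illustration: the function name is hypothetical, the per-block ac energy and low-to-high frequency energy ratio are assumed to be computed elsewhere (the report does not give the exact frequency regions used for the ratio), and the class numbering follows the caption of Fig. II-5.

```python
def classify(blocks):
    """blocks: list of (ac_energy, low_to_high_ratio), one pair per block.
    Returns a class label 1..4 per block, using sample means as boundaries.
    1: high energy, low frequency    2: low energy, low frequency
    3: high energy, high frequency   4: low energy, high frequency"""
    mean_e = sum(e for e, _ in blocks) / len(blocks)
    hi = [r for e, r in blocks if e >= mean_e]
    lo = [r for e, r in blocks if e < mean_e]
    # Each energy class is sub-divided at its own mean ratio.
    mean_r_hi = sum(hi) / len(hi) if hi else 0.0
    mean_r_lo = sum(lo) / len(lo) if lo else 0.0
    labels = []
    for e, r in blocks:
        if e >= mean_e:
            labels.append(1 if r >= mean_r_hi else 3)
        else:
            labels.append(2 if r >= mean_r_lo else 4)
    return labels

# Toy features (assumed values, not taken from the report).
blocks = [(100.0, 5.0), (90.0, 0.5), (10.0, 4.0), (8.0, 0.2)]
labels = classify(blocks)
```

Because the boundaries are sample means rather than quantiles, the four classes generally end up with different numbers of blocks.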

Using the original shown in Fig. V-1, the two methods discussed were
applied to obtain a total coding rate of 1.5 bits/pixel. Overhead allowed
(including error protection) averaged 15.5 bits/block, thus leaving an
average of 368.5 bits/block for the bit assignment.
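
As a check on these figures, assuming the 16 x 16 blocks (256 pixels) used throughout this section:

```latex
1.5\ \tfrac{\text{bits}}{\text{pixel}} \times 256\ \tfrac{\text{pixels}}{\text{block}}
  = 384\ \tfrac{\text{bits}}{\text{block}},
\qquad
384 - 15.5 = 368.5\ \tfrac{\text{bits}}{\text{block}}
```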

The method using 4 equal classes based on energy resulted in the
classification map of Fig. II-2 and the bit allocation maps of Fig. II-3.
The resulting reconstructed image is shown in Fig. II-4. The mean square
error is 35.2.

The energy and ratio method with variable number of blocks per class
resulted in the classification map of Fig. II-5 and the bit allocation maps
of Fig. II-6. The resulting reconstructed image is shown in Fig. II-7. The
mean square error is 28.0.

Fig. II-2. Classification map for Fig. V-1 using 4 equal size energy
classes. Class 1 represents 16x16 blocks with the highest energy and
Class 4, the lowest.

Fig. II-3. Bit allocation maps for the four classes shown in Fig. II-2.
Bits are assigned proportional to the sample variance of each coefficient
with the average bit rate 1.5 bits/pixel including overhead.

Fig. II-5. Classification map for Fig. V-1 using 4 variable size classes
based on energy and ratio of low frequency to high frequency energy.
Class 1 represents high energy, low frequency; Class 2 represents low
energy, low frequency; Class 3 represents high energy, high frequency;
Class 4 represents low energy, high frequency.

Fig. II-6. Bit allocation maps for the four classes shown in Fig. II-5.
Bits are assigned proportional to the sample variance of each coefficient
with the average bit rate 1.5 bits/pixel including overhead.

II.2.2. Feature Clustering

The performance improvement shown above has led to the development of a
more adaptive method of choosing the class features. Several frequency
regions of each transformed block are defined, including low frequency,
mid-frequency, high frequency, and horizontal and vertical edges. Typical
regions are shown in Fig. II-8 for a 16 x 16 block. Four features are
defined based on the total energy in each region. These are: (1) log (LOW /
[illegible]); (2) log (MED / [illegible]); (3) log ([illegible]); and
(4) log (LOW + MED + HIGH + EDGE).

The  histograms  of  these  four  features  are  collected  over  all  picture 
blocks. 

A clustering procedure is then used to find the histogram which can be
most obviously divided into two classes. After the first division, all four
histograms for each class are again examined and each class is again divided
into two using the most useful feature for each division.

Although this method offers a large amount of adaptability, its
performance is not significantly better than that of the ratio and energy
method (described earlier) for the typical aerial reconnaissance imagery
used in the present study.

II.2.3. Spatial Criteria

Another of the four class zonal coding techniques we have studied
involves both frequency domain and spatial domain information. The cosine
transform of each 16 x 16 block of pixels is examined to determine the
bandwidth of information contained within that block. In particular, the
a.c. energy contained in successively larger and larger circular frequency
domain zones (centered at the dc coefficient) is calculated as a percentage
of the total a.c. energy contained in this block. The minimum radius of a
circular zone required to contain at least 90% of this total a.c. energy
within the block is calculated. "Water" blocks typically exhibit bandwidth
radii of from four to seven. "City" blocks, on the other hand, take on
bandwidth radii up to and including the maximum allowed amount of 15. In
the spatial domain, these same circular zone radii are calculated, but now on
the basis of the circular zone size required to obtain a maximum absolute
error upon inverse transformation that does not exceed a threshold of 12.5.

Fig. II-8. Regions in the 16x16 Transformed Block Used for Determining
Classification of Each Block.


These two circular zone radii (based on 90% bandwidth and 12.5 maximum
absolute error) allow us to construct a two dimensional histogram that lists
the total number of picture blocks that have any given possible bandwidth
and require any given circular zone size to achieve a moderately low maximum
absolute error in the spatial domain of the decoded block. Those blocks
that exhibit low bandwidth, and also demonstrate very quick pointwise
convergence using the cosine transform, are placed in what we call class 1.
Those blocks having both a high bandwidth and slow spatial domain
convergence are defined to be in class 4. The class 2 blocks are also of
wide bandwidth but exhibit quick convergence in the spatial domain.
Class 3 blocks are of low bandwidth yet demonstrate very stubborn
convergence upon inverse transformation. (One algorithm we have used to
establish class boundaries in this two-dimensional histogram is a bi-modal
one that we discuss below. We have also coded images using class boundaries
that forced all four classes to be of equal size. Evaluations of images
coded using both schemes are reported on below.)
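
The frequency-domain half of this measurement can be sketched as below. The function name is hypothetical, and so are two details the report does not spell out: the distance of coefficient (u,v) from the dc term is taken as sqrt(u^2 + v^2), and the radius is capped at n - 1 (15 for a 16 x 16 block) to match the stated maximum.

```python
import math

def bandwidth_radius(coeffs, fraction=0.90):
    """Smallest integer radius r whose circular zone around the dc term
    holds at least `fraction` of the block's a.c. energy.
    coeffs: n x n list of transform coefficients; coeffs[0][0] is dc."""
    n = len(coeffs)
    energy = [(math.hypot(u, v), coeffs[u][v] ** 2)
              for u in range(n) for v in range(n) if (u, v) != (0, 0)]
    total = sum(e for _, e in energy)
    if total == 0:
        return 0  # flat block: no a.c. energy at all
    for r in range(1, n):
        if sum(e for d, e in energy if d <= r) >= fraction * total:
            return r
    return n - 1  # assumed cap at the maximum allowed radius

# Toy 4 x 4 blocks (assumed data): one low-frequency, one high-frequency.
low = [[0.0] * 4 for _ in range(4)]
low[0][0], low[0][1] = 100.0, 2.0
high = [[0.0] * 4 for _ in range(4)]
high[3][3] = 2.0
```

The spatial-domain radius would be found the same way, but by inverse transforming each candidate zone and comparing the maximum absolute pixel error against the 12.5 threshold.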

The result is an algorithm that places "water" blocks in class 1 and
"city" blocks in class 4, respectively, very reliably. More importantly,
however, we have found that this technique isolates into class 3 two very
troublesome types of image blocks: the "coastline" and the
"boat-in-the-water" blocks. The former occurs when a group of 16 x 16 pixels
overlaps a boundary between a region of high ac energy and a region of low
ac energy. Since the high energy pixels occupy only a portion of the total
256 pixel region, it is possible for other categorization schemes
erroneously to place such a block into a lower ac energy category. The
result is a poor rendering of the "city" portion of the block.
"Boat-in-the-water" blocks exhibit one or two pixels of highly isolated
detail in a background field of very low bandwidth. Again, observing only
the total ac energy within the block (or its bandwidth) could cause a more
simple-minded categorization technique to place this block in, say, a
"water" class. The result would be a blurring or even a complete loss of
the small potential target. By paying attention to both frequency bandwidth
and pointwise spatial domain convergence, we have found that we may very
reliably flag both of these block types.

Of course such an approach does require the repeated inverse
transformation of larger and larger subsets of the cosine transform
coefficients of each block. However, by making use of the normally high
correlation of the spatial domain radius from one block to its neighbor, we
may cut this computational labor to a minimum. The initial guess for the
spatial domain radius of each new block is simply taken as the resulting
radius value calculated for the previous block. Then adjustments in the
zone size to obtain the necessary point-wise convergence are made in an
upward or downward direction as needed. Typically only three or four
inverse transformations of each block are needed during categorization.

Once the above-mentioned two-dimensional spatial-frequency domain zone
size histogram is formed, it must be divided into four regions. This
division is based on a "most bimodal" criterion. If we define each element
$h_{ij}$ of the two-dimensional histogram $H$ as the number of blocks in the
(i,j) bin of the histogram (where i corresponds to the spatial domain zone
size and j corresponds to the frequency domain zone size), we can define two
one-dimensional histograms $x$ and $y$ with elements

$$x_i = \sum_{j} h_{ij}, \qquad y_j = \sum_{i} h_{ij}$$

respectively.

For any one-dimensional histogram $a$ with elements $a_i$ we can define a
measure of bimodality $b_i$ (Eq. II-7, not legible in the source text),
where $i_{min}$ is chosen to be equal to the smallest integer value i such
that

$$\sum_{j=1}^{i} a_j \ge 0.25 \sum_{k=1}^{16} a_k .$$

This condition ensures that at least 25% of the total number of blocks in
the histogram will be put into one of the two low energy classes and
prevents these classes from receiving too few bits in the bit assignment
algorithm due to a small sample size.

Using (II-7), we can define the most bimodal point for a
one-dimensional histogram and its corresponding location as

$$b_{max} = \max_i \{ b_i \}, \qquad i = i_{min}+1, \ldots, 15 \tag{II-8}$$

and

$$i_{max} = \text{index of } b_i \text{ at which } b_{max} \text{ occurred.} \tag{II-9}$$
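
The search can be sketched in Python under one important caveat: the bimodality measure b_i of Eq. II-7 is not reproduced here, so it is passed in as precomputed values rather than derived. The function names, the toy histogram and the stand-in values of b are hypothetical. The sketch assumes i_min is less than 15, so that the search range is not empty.

```python
def i_min(hist, fraction=0.25):
    """Smallest 1-based index i whose cumulative count reaches
    `fraction` of the total (the 25% condition on the lower classes)."""
    total = sum(hist)
    running = 0
    for i, a in enumerate(hist, start=1):
        running += a
        if running >= fraction * total:
            return i
    return len(hist)

def most_bimodal(b, imin, last=15):
    """Eqs. II-8 and II-9: b maps a 1-based index i to b_i.
    Searches i = imin+1 .. last and returns (b_max, i_max)."""
    imax = max(range(imin + 1, last + 1), key=lambda i: b[i])
    return b[imax], imax

# Toy 16-bin histogram and stand-in bimodality values (assumed).
hist = [10, 10, 10, 10] + [5] * 12
b = {i: -(i - 8) ** 2 for i in range(1, 17)}
start = i_min(hist)
b_max, i_max = most_bimodal(b, start)
```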

The actual location of the most bimodal split will be immediately on either
side of $i_{max}$, whichever division yields the most nearly equal classes.

Applying (II-8) and (II-9) to the one-dimensional histograms $x$ and $y$,
we choose the most bimodal of these (the one with the largest value of
$b_{max}$) for the primary division of the two-dimensional zone size
histogram. Two new one-dimensional histograms may now be formed, one on
each side of the primary split, with the major axis in both cases being the
criterion that was not used in the primary division (either spatial or
frequency domain zone size). These new one-dimensional histograms may then
be divided at their respective most bimodal points by again applying (II-8)
and (II-9) to each.

A typical two-dimensional histogram for a 256 x 256 picture is shown in
Figure II-9 with its appropriate bimodal divisions.

II.3. Preprocessing to Improve Subjective Ratings

Since the ultimate performance criterion for our coding was human photo
analyst subjective ratings, we found that some modifications could improve
the subjective ranking while actually raising the mean-square-error in the
resulting picture. Such is the case of a non-linear gray level
transformation prior to coding and the inverse process after coding. The
transformation used for 8-bit original data was:

$$y = \begin{cases} x, & 245 < x \le 255 \\ 40\,x^{0.4} - 116.17, & 23 \le x \le 245 \\ x, & 0 \le x < 23 \end{cases}$$

Such a power law ($x^{0.4}$) has been proposed as similar to the human visual
system response [6].

The more traditional logarithmic transformation was found to be too
flat at the high gray levels (the inverse after coding reconstruction
emphasized the ringing due to high frequency loss). This particular power
law was judged subjectively most pleasing by the authors.
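
A minimal sketch of this gray level transformation and its inverse follows. The exponent 0.4 is an inference: it is the value for which the middle segment meets the identity segments at x = 23 and x = 245 to within about one gray level, and the scanned exponent itself is not legible. The function names and the handling of the breakpoints are also assumptions.

```python
EXPONENT = 0.4           # inferred from continuity at the breakpoints
SCALE, OFFSET = 40.0, 116.17

def forward(x):
    """Transformation applied to an 8-bit gray level before coding."""
    if 23 <= x <= 245:
        return SCALE * x ** EXPONENT - OFFSET
    return float(x)      # identity at the dark and bright extremes

def inverse(y):
    """Inverse transformation applied after reconstruction."""
    if forward(23) <= y <= forward(245):
        return ((y + OFFSET) / SCALE) ** (1.0 / EXPONENT)
    return float(y)

# Round trip over all 8-bit levels; without coding it should be lossless.
round_trip_error = max(abs(inverse(forward(x)) - x) for x in range(256))
```

The slope of the middle segment is greater than one at the dark end and less than one at the bright end, so dark regions are stretched before coding. This matches the stated aim of spending more bits on the darker parts of the image.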
Subjective rankings showed that, on the average, the preprocessed
imagery was ranked better than the non-preprocessed imagery. As an example,
compare the two reconstructed images in Figs. V-3 and V-4. Fig. V-3 uses
the 4-feature method described in the previous section. Fig. V-4 used the
same method, except that preprocessing precedes coding and the inverse
operation is applied following reconstruction. The effect of the
preprocessing is to emphasize the energy in the dark regions of the image so
that more bits are used to code the darker regions than normal. Since the
human is more sensitive to the same incremental additive changes in a dark
region than in a bright one, the coding more closely matches the human's
response even though the mean square error increased (the mean square error
decreases in the dark regions and increases in the bright regions).

II.4. Resolution and Quantization Considerations

Most image bandwidth compression studies in the past have aimed for
good aesthetic reception (of a coded-decoded image) by the human brain, and
have often relied upon the existence of substantial correlation among
neighboring pixels. However, when aesthetic appearance is made secondary to
sheer useful information content, and/or when each isolated pixel is
potentially important, then a number of judgements and techniques in image
coding are required to change. We will use this section to examine some of
these new considerations at bit rates in the 0.5-2.0 bits/pixel range. All
of our example blocks (of size 16x16 pixels) will be drawn from the "AP2"
original shown in Fig. V-1. Over several regions (especially the city areas
at the upper left) the image bandwidth here approaches the spatial Nyquist
limit.


Our work for the Rome Air Development Center has required image
compressions down to 0.5 bits/pixel. Since most (but not all) spatial
domain coding methods are limited to 1 bit/pixel and above, a substantial
amount of our effort has been applied to frequency domain coders. These
transform coding techniques are easily capable of image compression rates
down to 0.5 bits/pixel and below. Thus most of the discussion of this
section will be in reference to zonal transform coding of, for simplicity,
the single class type. Here the image is artificially broken into
sub-images of size 16x16 called "blocks".

A particularly troublesome problem for frequency domain coders is
presented by image regions of high detail that are of very limited spatial
extent. In Fig. V-1 these could be boats of one or two pixel width
appearing on a background of water. Alternatively, a narrow strip of
coastline may turn out to occupy only an edge of an otherwise low frequency
block. In any case, and depending upon the intricacies of the coding method
used, rendering such isolated details without uniformly increasing the
transmission bit rate presents a problem that, while aesthetically
negligible, may well be of real concern for reconnaissance purposes.

One means of handling such special cases, without uniformly easing up
on the compression rate, is to provide an increased number of bits for these
(hopefully few) blocks of isolated detail on an adaptive basis. This
naturally implies that schemes are available to detect these special
situations. One detection approach that seems particularly attractive is to
total the percent a.c. energy within a block, working from the lower toward
the higher spatial frequency coefficients. (These sums-of-squares of
transform coefficients would be calculated by the coder during its
evaluation of coefficient variances for bit assignment. Thus the a.c.
energy calculations will require further arithmetic additions but no new
multiplications.) If a noticeably large number of squared coefficients are
required to total to, say, 90% of the full block transform a.c. energy, then
additional coding bits for this block are apparently called for. Such an
approach will clearly flag any image block containing a large amount of high
frequency energy, such as a "city" region mentioned above. Since these
blocks are obviously prime candidates for receiving extra coding bits (to
better represent their more important higher frequency coefficients), this
"energy threshold" approach looks promising.
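
The energy threshold test can be sketched as a count of how many squared coefficients are needed to reach the chosen fraction of the a.c. energy. The function name is hypothetical, and the low-to-high frequency ordering used here (increasing u + v) is an assumption, since the report does not state the exact scan order.

```python
def coeffs_for_energy(coeffs, fraction=0.90):
    """Number of a.c. coefficients, taken from low to high spatial
    frequency, needed to accumulate `fraction` of the a.c. energy.
    coeffs: n x n list of transform coefficients; coeffs[0][0] is dc."""
    n = len(coeffs)
    order = sorted(((u, v) for u in range(n) for v in range(n)
                    if (u, v) != (0, 0)),
                   key=lambda p: (p[0] + p[1], p[0]))  # assumed scan order
    total = sum(coeffs[u][v] ** 2 for u, v in order)
    if total == 0:
        return 0  # flat block: nothing to accumulate
    running = 0.0
    for count, (u, v) in enumerate(order, start=1):
        running += coeffs[u][v] ** 2
        if running >= fraction * total:
            return count
    return len(order)

# Toy 4 x 4 blocks (assumed data).
smooth = [[0.0] * 4 for _ in range(4)]
smooth[0][1] = 3.0            # all a.c. energy in the lowest frequency
busy = [[0.0] * 4 for _ in range(4)]
busy[3][3] = 3.0              # all a.c. energy in the highest frequency
```

A block would be flagged for extra bits when this count is large relative to the block size. The choice of that cutoff is left open here.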

For "city" image blocks widely dominated by regions of detail, this
energy threshold detection method performs well. Fig. II-10 indicates that
it is less than satisfactory for blocks of isolated detail, however. Here
we have extracted a single "boat in the water" block from Fig. V-1. In the
original image the boat is represented by only one pixel while the water
presents a background of slowly changing intensity. The upper left quarter
of Fig. II-10 represents the recovered image block where 90% of the total
a.c. transform energy is used for the inverse transformation. The one-pixel
boat is nowhere in evidence. Even at a 95% energy threshold, the boat is
only marginally distinguishable and thus vulnerable to obliteration by
transform coefficient quantization errors. At 98% (lower left) the boat has
become quite distinct, but at such a high threshold most other blocks within
the scene were found to have their bit rates set at a much higher level than
their information content could justify. The result is an unacceptable loss
in the overall compression rate.

[Figure caption fragments, partially legible: "... of a single pixel target
using ... a.c. energy within a transform block"; "... 16x16 Cosine Transform
matrix of the isolated pixel block is used for the inverse transformation."]

To understand the isolated pixel problem in the context of the two
dimensional Cosine Transform, we refer to Figs. II-11 and II-12. We apply
the Cosine Transform to an image test block consisting of a single unit
intensity pixel placed in the middle of a field of zero (dark) pixels.
Scanning down the main diagonal of the resulting transform array, we
encounter the coefficient values plotted in Fig. II-11. While the two
dimensional Fourier Transform can be shown to produce a transform array of
equal coefficient magnitudes, this is definitely not true of the Cosine
Transform. While the transform energy is widely distributed, a few
coefficients take on values several times those of others, particularly the
highest spatial frequency at location (15,15) in the transform matrix.
Fig. II-11 shows this and other higher frequency coefficients to be
especially important. The Fourier transform could be expected to perform
much better on this very special type of block.
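
The contrast between the two transforms on a single-pixel block can be reproduced with a short, dependency-free sketch. The orthonormal DCT-II normalization and the pixel position (8,8) are assumptions, since the report does not state either, and the function names are hypothetical.

```python
import cmath
import math

N = 16

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (rows, then columns)."""
    def c(k):
        return math.sqrt((1 if k == 0 else 2) / N)
    basis = [[c(u) * math.cos((2 * m + 1) * u * math.pi / (2 * N))
              for m in range(N)] for u in range(N)]
    rows = [[sum(basis[v][n] * block[m][n] for n in range(N))
             for v in range(N)] for m in range(N)]
    return [[sum(basis[u][m] * rows[m][v] for m in range(N))
             for v in range(N)] for u in range(N)]

def dft2_magnitudes(block):
    """Magnitudes of the (unnormalized) 2-D DFT of an N x N block."""
    return [[abs(sum(block[m][n]
                     * cmath.exp(-2j * math.pi * (u * m + v * n) / N)
                     for m in range(N) for n in range(N)))
             for v in range(N)] for u in range(N)]

# Single unit-intensity pixel near the middle of a dark field.
block = [[0.0] * N for _ in range(N)]
block[8][8] = 1.0

dct = dct2(block)
dct_mags = [abs(x) for row in dct for x in row]
dft_mags = [x for row in dft2_magnitudes(block) for x in row]
```

For this test block every DFT magnitude is the same, while the DCT magnitudes vary widely and the largest one falls at (15,15), in line with the behavior described for Fig. II-11.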

In the spatial domain this very slow convergence of the Cosine
Transform to a "boat in the water" is depicted by Figure II-12. Here
inverse transforms are performed on submatrices of the full transform matrix
of size 1x1, 2x2, 3x3, etc. until the full matrix is used (16x16). Even
at 15x15, when 88% of the coefficients are in use, the maximum absolute
error exceeds 13%. Further experimentation with this threshold criterion
shows that if the "boat" size is increased to 2x2 pixels, a 90% threshold
detects these blocks with only a small loss in the overall compression rate.
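
The slow pointwise convergence can be illustrated on the idealized test block of a single unit pixel. This sketch relies on the fact that the DCT of a single pixel at (m0, n0) is the product of the basis functions evaluated at m0 and n0, so the inverse transform of an L x L low-frequency submatrix separates into two one-dimensional sums. The orthonormal normalization, the pixel position and the function names are assumptions, and the resulting error values are for this idealized block, not for the actual image block of Fig. II-12.

```python
import math

N = 16

def phi(u, m):
    """Orthonormal DCT-II basis function u evaluated at sample m."""
    scale = math.sqrt((1 if u == 0 else 2) / N)
    return scale * math.cos((2 * m + 1) * u * math.pi / (2 * N))

def max_abs_error(L, m0=8, n0=8):
    """Maximum absolute reconstruction error for a unit pixel at (m0, n0)
    when only the L x L low-frequency corner of the transform is kept."""
    gm = [sum(phi(u, m0) * phi(u, m) for u in range(L)) for m in range(N)]
    gn = [sum(phi(v, n0) * phi(v, n) for v in range(L)) for n in range(N)]
    return max(abs(gm[m] * gn[n] - (1.0 if (m, n) == (m0, n0) else 0.0))
               for m in range(N) for n in range(N))

errors = {L: max_abs_error(L) for L in range(1, N + 1)}
fraction_at_15 = 15 * 15 / (N * N)   # share of coefficients kept at 15x15
```

Keeping the 15x15 submatrix uses about 88% of the coefficients, yet the error for this test block remains well above zero, and it only vanishes when the full 16x16 matrix is used.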

While "city" blocks should clearly be provided additional coding bits
in most adaptive schemes, blocks of a "coastline" or "boat" nature
(described above) might best be represented through some other form of
special handling, such as resorting to a spatial domain coding technique for
these unique cases. Thus a means is still required to single them out. We
have found the following criteria to be quite effective: all image blocks
requiring a relatively small number of transform coefficients to account
for, say, 90% of the total a.c. energy are found. Among these, several
inverse transforms are performed on each transform array to determine those
blocks needing a larger number of represented coefficients to achieve
satisfactory pointwise convergence. (We use a maximum absolute error of 12.5
for pixels in the range 0-255.) The first measure (a.c. energy) is basically
a mean-square error one, while the second is a spatial domain maximum
absolute error measure. Those blocks exhibiting good energy compaction (from
a pure percentage viewpoint), while still showing poor spatial domain
convergence, are placed in the "boat-coastline" class. This has been found
to work very well. "City" blocks can also be flagged reliably. These too
exhibit slow pointwise convergence, but a large number of transform
coefficients are needed to account for 90% of the total a.c. energy.

Adaptively increasing the number of coding bits allowed for a
troublesome image block can lead to other problems in a photo reconnaissance
context. The original of the "boat in the water" block used above appears at
the left of Fig. II-13. After the coefficient quantization inherent in our
zonal coding method is applied, and the result is inverse transformed, we
obtain the block at the right of Fig. II-13. Quantization errors, as
expected, have contributed to a heavy graininess in the water field
surrounding the boat. Aesthetically this is not pleasing, but from an
information-content point of view the boat "target" has been well
represented. In fact, the quantization has caused its accentuation: the
original boat intensity of 101 has been raised to 142. However, potential
false targets have been introduced. Partially surrounding the boat, at
directions of 3, 9, and 12 o'clock, are three very dark pixels reminiscent
of the "precursory undershoot" exhibited in the Fourier Gibbs phenomenon.
These have been lowered in intensity from 72 to 44. They now deviate from
the background intensity by an amount equal to the deviation of the original
boat intensity. Aesthetically these dark pixels are simply a part of the
overall objectionable graininess. From a reconnaissance viewpoint, however,
their contrast as well as their symmetric placement about the boat could
cause them to appear as false targets: a problem almost as severe as the
loss of a potential target discussed in connection with Fig. II-11.

A second problem arises when an adaptive coder assigns a significantly
larger number of bits to a few blocks in an otherwise low frequency group of
blocks. At the left of Fig. II-14, a water region, with two potential
targets lying in distinct blocks, is shown. At the right we see the result
of coding followed by decoding with a larger number of bits set aside for
the two target blocks. The juxtaposition of two or more blocks of greatly
differing graininess causes an aesthetic defect that disturbs even some
professional photo analysts. The creation of artifacts that are potential
false targets is certainly a legitimate issue. However, when an entire water
region appears grainy, the coding method causing this can actually be rated
higher than another method that suppresses granularity except in blocks of
high detail. Thus not all photo analysts are unconcerned with aesthetics,
and this fact, possibly above all others, makes coding for photo
reconnaissance especially difficult at higher compression rates.

A substantial portion of the graininess mentioned above is due to the
large number of one bit (two level) quantizers prescribed for the higher
frequency transform coefficients by the bit assignment algorithm. These
coefficients have been provided no zero level and so any unnecessary
coefficient, no matter how small, is forced to take on a full-scale
(positive or negative) value equal to all others of the same bit-length.
We have, therefore, tested the efficacy of forcing all assigned two level
coefficients to three levels. The result of this coding strategy change on
"water" blocks is shown in Fig. II-15. Using strictly two-level quantizers
on the higher frequency coefficients of the original (shown at the lower
left) produces the graininess apparent in the decoded block at the upper
left. Then increasing all these two-level quantizers to three-level ones
(and adjusting the bit assignment algorithm parameters to obtain the same
overall compression) results in the much more satisfactory rendering of
the upper right. Also, the mean-square error (m.s.e.) has dropped from 26.5
to 17.2.

Testing this same three-level strategy on blocks having three-pixel
boats yields the results of Fig. II-16. Again the graininess is improved,
and the m.s.e. is reduced from 39.3 to 24.3. With the large drop in
graininess also comes a significant decrease in the probability of
obtaining a false target.

If an isolated target block is similarly coded, however, Fig. II-17
shows that the target can almost completely disappear. At higher
compressions, this disappearance is almost certain to occur. (In all these
three-level tests we have used a compression rate of 1.72-1.77
bits/pixel.) Somewhat surprisingly, though the graininess has again
improved, the m.s.e. has increased from 66.3 to 84.7.

Use of the three-level strategy on "city" blocks renders these also
more pleasing to the eye by reducing artifact clutter. The m.s.e. is not
always improved, however. At 1.72 bits/pixel, the two-level coder applied
to a block positioned over the right sections of two horizontal runways
(see Fig. V-1) produces the block images of Fig. II-18. The m.s.e. here has
risen from 44.6 to 49.0. The same trend is in evidence at a higher
compression rate of 0.59 bits/pixel (see Fig. II-19). The m.s.e. of 170.2
has become 198.3, and block boundaries have begun to appear.

We have concluded that three-level quantizers applied to higher
frequency transform coefficients usually result in increased aesthetic
appeal (at least at moderate compressions), but the possible loss of
potential targets makes their use unjustifiable in reconnaissance work.

At 0.59 bits/pixel, the three-level quantization scheme may not fare as
well. The m.s.e. again increases, and block boundaries are accentuated.


Transmission errors in transform coded information are conventionally
thought to be preferable to the "salt and pepper" errors accompanying the
use of some spatial domain coders such as PCM. We do not believe this issue
is at all clear cut in the case of aerial reconnaissance photography. As
for frequency domain information, the toggling of a code bit may or may not
cause large decoding errors. But the results of channel errors applied to
some spatial domain coders will often be highly localized ("salt and
pepper"), and so frequently the resulting artifacts can be ruled out as
targets by contextual considerations. In the last analysis, these
aesthetically annoying defects would then be of no importance. But toggling
bits of higher significance in transform coefficient information can soften
target edges throughout a block, thus impeding target identification and
even detection. Since each coefficient affects every pixel value within an
inverse transformed block, an extreme error situation can result in the
obliteration of several targets within a block. This latter is not common,
however.

III.  Hybrid  Coding 

This  project  was  designed  to  investigate  the  use  of  a hybrid  method  in 
the  transform  coding  of  aerial  reconnaissance  images. 

The hybrid technique that was investigated was introduced by Habibi
[7]. It consists of taking a one dimensional unitary transform of a strip
of data and passing it into a bank of differential pulse code modulators,
DPCM. The resulting differential signal is then quantized and passed
through a simulation of a noisy channel. At the receiver the data is then
fed into an associated DPCM system, and the inverse transform is taken.

There were four main areas that were investigated concerning the
implementation of the method:

1.  Transform Method

2.  Optimum Strip Length

3.  Quantization Method

4.  DPCM System Configuration

III.1.  Transform Method

There are many transform techniques available for this method. Among
these are: Karhunen-Loeve (KL), Fourier, Walsh, and cosine. These are all
discussed in Ahmed and Rao [8]. The KL transform is recognized as being the
best in the MSE sense when fewer than the full set of basis functions are
used, as is necessary in image compression. However, it is rarely used, as
no fast algorithm for its computation has been developed.

The cosine transform was introduced by Ahmed, Natarajan, and Rao [3].
They show that its performance is a close approximation of the KL
transform. The hybrid method implemented here uses the discrete cosine
transform. Chen, Smith, and Fralick [4] have published a fast algorithm for
computing the cosine transform that does not require the use of a fast
Fourier transform. This algorithm was used to further reduce the
computational cost of the hybrid method.

III.2.  Strip Length

When dealing with large images it would be ideal to be able to work
with the whole image at one time. This would take advantage of all of the
redundancy available in the image. However, the amount of computational
time and computer storage space that this would require would be
prohibitive. Therefore the picture must be broken up into smaller subunits
which are processed separately. In implementing this method a decision had
to be made as to the desired length of the strips over which the one
dimensional transforms would be made. Figure III-1 presents the
relationship between the strip length and the operations count required to
compute the fast DCT. This is a significant portion of the total number of
computations required for the method, and it is desired to keep this as low
as possible in order that this method might be realizable in a "real time"
system with a dedicated processor.

The method was implemented using strips of length ranging from four to
32 pixels. The resultant MSE performance as a function of the strip size
used is graphed in Figures III-2 and III-3. As can be seen, there is no
appreciable improvement in MSE performance for strips with length greater
than eight pixels at 1.5 bits/pixel. However, at 0.5 bits/pixel, the MSE of
images processed with strips eight pixels long is significantly larger than
that of images processed with strips sixteen pixels long. There is also
significant improvement in the visual characteristics of images processed
with strips sixteen pixels long, versus those processed with strip lengths
of eight pixels. No similar improvement, when compared with the increased
computational cost, is noticed when the strip size is increased to 32
pixels.

III.3.  Quantization Methods

The  images  that  were  processed  were  originally  quantized  to  256 
levels,  or  eight  bits/pixel.  In  order  to  achieve  the  desired  bandwidth 
reduction  the  transmitted  image  was  required  to  be  sent  at  an  average  rate 
which  varied  between  1.6  and  0.5  bits/pixel,  depending  on  the  degree  of 
compression  desired. 

The optimal bit assignments for the method depend on the type of image
being processed. Images which change rapidly in the spatial domain contain
a large amount of high frequency information. It is therefore desirable to
assign a relatively larger number of bits to the higher coefficients for
such an image than for an image which changes slowly in the spatial domain.
In order to determine the optimum bit assignments it is first necessary to
determine the statistics of the image that is being processed. The mean and
the variance of the differential coefficients are calculated using the
standard formulas [9]:

$$\bar{a}_j = \frac{1}{m} \sum_{i=1}^{m} a_{ij}, \qquad \sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} (a_{ij} - \bar{a}_j)^2$$

where $a_{ij}$ is the $j$th differential coefficient in the $i$th strip,
$m$ is the total number of strips, $\bar{a}_j$ is the mean of the $j$th
differential coefficient, and $\sigma_j^2$ is its variance. With the
exception of the dc differential coefficient, the mean values are
approximately zero. The square root of the variance is taken to determine
the standard deviation, $\sigma_j$.

The bit assignments are then made on the basis of:

$$\mathrm{NBITS}_j = \left[ \log_2(\sigma_j / \bar{\sigma}) + 1 \right] \times \mathrm{AVEBITS}$$

where AVEBITS is the desired average number of bits/pixel, $\bar{\sigma}$
is the average standard deviation, and $\mathrm{NBITS}_j$ is the number of
bits assigned to the $j$th differential coefficient.

At  high  compressions  it  has  been  discovered  that  the  above  bit 
assignment  rule  must  be  modified  somewhat.  After  the  bit  assignments  are 
made,  the  dc  differential  coefficient  must  be  checked  to  ensure  that  at 
least  three  bits  have  been  assigned  to  it.  If  not,  bits  must  be  taken  from 
the  higher  coefficients  and  given  to  dc.  This  is  to  ensure  that  no  "blocki- 
ness"  appears  in  the  compressed  image. 
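
The bit assignment rule and the dc correction can be sketched as follows. This is only an illustration under stated assumptions: the function name `assign_bits`, the rounding to the nearest integer, the clipping of negative results to zero, and the choice to take bits from the highest coefficient first are not given in the text and are hypothetical.

```python
import math
from statistics import pstdev


def assign_bits(diff_coeffs, avebits, min_dc_bits=3):
    """Assign quantizer bits to each differential coefficient.

    diff_coeffs: list of strips, each a list of coefficients (index 0 is dc).
    """
    columns = list(zip(*diff_coeffs))
    # sigma_j: standard deviation of the j-th differential coefficient.
    sigma = [max(pstdev(col), 1e-12) for col in columns]
    sigma_bar = sum(sigma) / len(sigma)          # average standard deviation
    raw = [(math.log2(s / sigma_bar) + 1.0) * avebits for s in sigma]
    # Assumption: round to the nearest integer, never below zero bits.
    bits = [max(0, round(r)) for r in raw]
    # Rule from the text: dc needs at least three bits, taken from the
    # higher coefficients (assumption: highest index is raided first).
    k = len(bits) - 1
    while bits[0] < min_dc_bits and k > 0:
        if bits[k] > 0:
            bits[k] -= 1
            bits[0] += 1
        else:
            k -= 1
    return bits
```

With two strips whose coefficients have standard deviations 8, 4, 2, and 2, `assign_bits(strips, 1.0)` first yields 2, 1, 0, 0 and then moves one bit to dc so that it reaches three bits.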

The differential coefficients can best be modeled by Laplacian random
variables (with a two sided exponential density function). As described by
Max [10], the noise introduced by the quantizer may be minimized in the MSE
sense by using non-uniform quantization steps. These cutoff levels were
determined using the iterative technique that Max described.
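
A minimal sketch of such an iterative design is given here. It is a sample-based variant (cutoffs at midpoints between output levels, output levels at the mean of the samples in each cell) rather than the density-based computation that Max describes, and the names `lloyd_max` and `quantize`, the starting levels, and the iteration count are assumptions made for illustration.

```python
import bisect
import random


def lloyd_max(samples, n_levels, n_iter=50):
    """Return (cutoffs, levels) of a quantizer fitted to the samples."""
    data = sorted(samples)
    n = len(data)
    # Assumption: start from the centre sample of equal-count groups.
    levels = [data[(2 * k + 1) * n // (2 * n_levels)] for k in range(n_levels)]
    for _ in range(n_iter):
        # Cutoffs lie halfway between neighbouring output levels.
        cutoffs = [(lo + hi) / 2 for lo, hi in zip(levels, levels[1:])]
        edges = [0] + [bisect.bisect_left(data, c) for c in cutoffs] + [n]
        # Each output level moves to the mean of the samples in its cell.
        for k in range(n_levels):
            cell = data[edges[k]:edges[k + 1]]
            if cell:
                levels[k] = sum(cell) / len(cell)
    cutoffs = [(lo + hi) / 2 for lo, hi in zip(levels, levels[1:])]
    return cutoffs, levels


def quantize(x, cutoffs, levels):
    """Map x to the output level of the cell it falls in."""
    return levels[bisect.bisect_left(cutoffs, x)]


# Laplacian samples: the difference of two unit exponentials.
rng = random.Random(0)
laplacian = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(5000)]
cutoffs, levels = lloyd_max(laplacian, 4)
```

For a Laplacian source the outer steps come out wider than the inner ones, which is the non-uniform spacing referred to above.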

The first line must be sent via a pulse code modulation technique. It
was assigned eight bits per coefficient, due to the greater amount of
information which it contains.


III.4.  DPCM System Configuration


One  of  the  design  constraints  for  the  system  was  that  it  was  to  perform 
well  in  the  presence  of  simulated  channel  noise.  There  are  two  possible 
methods  for  the  error  protection  of  the  DPCM  system:  updating  and  leak. 

In the updating method the original signal, rather than the
differential one, is periodically transmitted at a much higher bit rate.
Thus the quantizer is periodically restarted, which prevents a channel
error from propagating through the entire image. This method was
implemented by sending the original spatial information every 32nd line,
quantized to six bits/pixel. In the resulting image the channel errors
appeared as black and white streaks running down a column of the image. At
low bit rates, 0.5 bits/pixel, the updating lines were much sharper, and
thus distracting. Also at low bit rates the cost of such updating, as
measured in the decrease in the available bits for the differential lines,
becomes prohibitive. Some improvement was noticed when the cosine transform
coefficients, rather than the grey levels, of the updating lines were
transmitted. This served to smooth the error out over a larger region of
the image, but the cost is still prohibitive.

The leak method can be designed in terms of the optimal MSE predictor.
It is desired to determine the best MSE estimate of the signal, $s_{ij}$,
in terms of the previous value of the signal, $s_{i-1,j}$. The signal to be
transmitted will then be the difference between this estimate and the
actual signal. The estimate, $\hat{s}_{ij}$, of $s_{ij}$ is:

$$\hat{s}_{ij} = a_j s_{i-1,j}$$

We desire $E\{(s_{ij} - \hat{s}_{ij})^2\}$ to be minimum. Therefore we wish
to minimize:

$$E\{(s_{ij} - a_j s_{i-1,j})(s_{ij} - a_j s_{i-1,j})\}$$

Differentiating with respect to $a_j$ and equating to zero:

$$-2E\{s_{ij} s_{i-1,j}\} + 2 a_j E\{s_{i-1,j}^2\} = 0$$

$$a_j = E\{s_{ij} s_{i-1,j}\} / E\{s_{i-1,j}^2\}$$

Using estimates for the autocorrelation, $R_j(0,0)$, of the $j$th cosine
transform coefficients, and the crosscorrelation, $R_j(0,1)$, between that
coefficient and the corresponding one in the previous line, the leak
coefficient becomes:

$$a_j = R_j(0,1) / R_j(0,0)$$

Figure III-4 presents a typical transmitter-receiver pair for this
method. This method would require a parallel set of these systems, equal in
number to the strip length. Notice that the mean value of the differential
information must be taken into consideration for the dc coefficients. This
is due to the high degree of correlation which is present between the dc
values. The optimal noise-free value of $a$ would be approximately 0.98.
However, if this value were used, it would not allow channel errors to
"leak out" at a fast enough rate. Therefore the maximum value which is
allowed for any $a$ is 0.90. For some of the higher frequency coefficients,
the crosscorrelation is negative. When this occurs, the coefficients are
assumed to be uncorrelated, and $a$ is set equal to zero. Figure III-5
presents typical values for the statistics of an image used for aerial
reconnaissance.
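
The choice of leak coefficient, with its ceiling of 0.90 and its floor of zero, might be sketched as below. The correlation estimates are taken here as plain sample averages over adjacent lines, which is an assumption, as are the names `leak_coefficients` and `receiver_loop`; the dc mean correction mentioned above is omitted for brevity.

```python
def leak_coefficients(lines, a_max=0.90):
    """lines[i][j] is the j-th transform coefficient of line i."""
    coeffs = []
    for j in range(len(lines[0])):
        prev = [row[j] for row in lines[:-1]]
        cur = [row[j] for row in lines[1:]]
        # Assumed sample estimates of R_j(0,0) and R_j(0,1).
        r00 = sum(p * p for p in prev) / len(prev)
        r01 = sum(c * p for c, p in zip(cur, prev)) / len(prev)
        a = r01 / r00 if r00 > 0 else 0.0
        # Negative correlation is treated as none; large values are capped.
        coeffs.append(min(max(a, 0.0), a_max))
    return coeffs


def receiver_loop(diffs, a):
    """Rebuild one coefficient track from its received differences."""
    out, s = [], 0.0
    for d in diffs:
        s = a * s + d
        out.append(s)
    return out


# A single unit error in the first difference decays as a**k at line k.
error_track = receiver_loop([1.0] + [0.0] * 9, 0.90)
```

With a = 0.90 a unit error has fallen below 0.4 after nine lines, whereas with a = 0.98 it would still exceed 0.8, which illustrates why the smaller value lets errors leak out faster.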

The performance of the hybrid method was greatly improved using this
DPCM system. The resultant images were much sharper, and the method was
less sensitive to channel noise. Figure III-6 shows the performance of this
method in the presence of channel noise. Images suitable for reconnaissance
analysis are obtainable with noise probability up to 10 ^ , at 1.5
bits/pixel.

III.5.  Summary

This project was designed to investigate and implement a hybrid method
for image transform coding. The results show that a hybrid method, using a
one dimensional discrete cosine transform and DPCM between adjacent lines,
can produce results that are comparable to other available techniques. The
advantages of the method are its speed of operation and its relative ease
of implementation. The hybrid system produces images of the quality needed
for reconnaissance photo analysis at compressions down to 1.5 bits/pixel.
The system also performed extremely well in the presence of random channel
noise. This method could also be used in other applications, such as
digital television, where a rapid method of image compression is needed.

IV.  Block  Truncation  Coding 

The use of Block Truncation Coding (BTC) relative to this work was
first presented in an earlier report [1] and will be reviewed briefly here.
Block Truncation Coding can be formulated as the application of a
nonparametric (one-bit) moment preserving quantization. The BTC algorithm
originally presented preserved only the first two sample moments in each
4x4 image block. The threshold was chosen a priori as the sample mean. In
this section, we will present our latest modifications of BTC. These
modifications, along with a thorough description of the basic algorithm,
can be found in [11]. A theoretical development is presented in [12].
Before presenting the modifications to BTC we will briefly review BTC and
present two other nonparametric one bit quantization schemes, using
classical fidelity criteria, that we used for comparison with BTC. Let
$m = n^2$ and let $X_1, X_2, \ldots, X_m$ be the values of the pixels in a
block of the original picture ($n = 4$ for our case).

Let

$$\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i \quad \text{be the first sample moment}$$

$$\overline{X^2} = \frac{1}{m} \sum_{i=1}^{m} X_i^2 \quad \text{be the second sample moment}$$

$$\bar{\sigma}^2 = \overline{X^2} - \bar{X}^2 \quad \text{be the sample variance}$$

As with the design of any one bit quantizer (see Figure IV-1), it is
necessary to find a threshold and two output levels for the quantizer such
that:

$$\text{if } X_i \ge X_{th}, \quad \text{output} = b$$

$$\text{if } X_i < X_{th}, \quad \text{output} = a$$

for $i = 1, \ldots, m$, where $X_{th}$ is the threshold and $a$ and $b$ are
the "low" and "high" output levels respectively.


For our basic BTC quantizer we shall make an ad hoc assumption that
$X_{th} = \bar{X}$. This seems reasonable; however, we will later modify
this assumption to obtain a more consistent result. The output levels $a$
and $b$ for a two-level moment preserving quantizer are found by solving
the following equations. Let $q$ = number of $X_i$'s greater than $X_{th}$
($\bar{X}$ in this case). We then have

$$m\bar{X} = (m - q)a + qb$$

$$m\overline{X^2} = (m - q)a^2 + qb^2 \qquad \text{(IV-3)}$$

Equation (IV-3) is readily solved for $a$ and $b$:

$$a = \bar{X} - \bar{\sigma}\sqrt{\frac{q}{m - q}}, \qquad b = \bar{X} + \bar{\sigma}\sqrt{\frac{m - q}{q}} \qquad \text{(IV-4)}$$

Figure IV-1. In designing a one bit quantizer given the data values
(assumed to be a continuous density function), one must find a threshold
value $X_{th}$ and two output levels, $a$ and $b$.

The image is coded by transmitting $\bar{X}$, $\bar{\sigma}$ and an
$n \times n$ bit plane consisting of 1's and 0's depending on whether a
given pixel is above or below $X_{th}$. Assuming $\bar{X}$ and
$\bar{\sigma}$ are each assigned 8 bits, this results in an image
representation of 2 bits/pixel. The receiver reconstructs the image block
by calculating $a$ and $b$ from Equation IV-4 and placing those values in
accordance with the bits in the bit plane.
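
The basic coder and receiver can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the mean and standard deviation are left unquantized rather than rounded to 8 bits, and q is taken as the number of ones in the bit plane so that the two output levels reproduce the first two sample moments.

```python
import math


def btc_encode(block):
    """block: flat list of m pixel values. Returns (mean, sigma, bit plane)."""
    m = len(block)
    mean = sum(block) / m
    second = sum(x * x for x in block) / m
    sigma = math.sqrt(max(second - mean * mean, 0.0))
    # Threshold at the sample mean: 1 where the pixel is at or above it.
    plane = [1 if x >= mean else 0 for x in block]
    return mean, sigma, plane


def btc_decode(mean, sigma, plane):
    """Rebuild the block from the two statistics and the bit plane."""
    m, q = len(plane), sum(plane)
    if q == 0 or q == m:          # flat block: a single level suffices
        return [mean] * m
    a = mean - sigma * math.sqrt(q / (m - q))    # "low" output level
    b = mean + sigma * math.sqrt((m - q) / q)    # "high" output level
    return [b if bit else a for bit in plane]


block = list(range(16))           # a 4x4 block, flattened
reconstructed = btc_decode(*btc_encode(block))
```

For a 4x4 block the transmitted data amount to 16 bit-plane bits plus 8 bits for each statistic, or 32 bits for 16 pixels, which is the 2 bits/pixel quoted above.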

Other techniques could be used to design a one-bit quantizer. Other
fidelity criteria have been used in quantizers, particularly the mean
square error (MSE) and mean absolute error (MAE). BTC uses a fidelity
criterion of moment preservation (MP). To use the MSE fidelity criterion,
one proceeds by first constructing a histogram of the $X_i$'s (i.e.,
sorting the $X_i$'s). Let $Y_1, Y_2, \ldots, Y_m$ be the sorted $X_i$'s;
i.e., $Y_1 \le Y_2 \le \ldots \le Y_m$. Again let $q$ be the number of
$X_i$'s greater than $X_{th}$. Then $a$ and $b$ are found by minimizing:

$$J_{MSE} = \sum_{i=1}^{m-q} (Y_i - a)^2 + \sum_{i=m-q+1}^{m} (Y_i - b)^2 \qquad \text{(IV-5)}$$

where

$$a = \frac{1}{m-q} \sum_{i=1}^{m-q} Y_i, \qquad b = \frac{1}{q} \sum_{i=m-q+1}^{m} Y_i$$

In general it is impossible to solve this equation in closed form for
$X_{th}$, $a$, and $b$. One way to solve this problem is to try every
possible threshold (there are at most $m-1$ thresholds) and pick the one
with smallest $J_{MSE}$. Assuming $a$ and $b$ have 8-bit resolution, this
gives a representation of 2 bits/pixel.

The problem of using the MAE fidelity criterion is very similar to the
MSE. The values $a$ and $b$ are found by minimizing:

$$J_{MAE} = \sum_{i=1}^{m-q} |Y_i - a| + \sum_{i=m-q+1}^{m} |Y_i - b| \qquad \text{(IV-6)}$$

where

$$a = \text{median of } (Y_1, Y_2, \ldots, Y_{m-q}), \qquad b = \text{median of } (Y_{m-q+1}, \ldots, Y_m)$$

Here again the quantizer is arrived at by an exhaustive search. Results
using these quantizers and BTC are shown in Figure IV-2. The performance of
BTC is quite good when compared to these standard fidelity criteria. The
advantage of just preserving the sample moments is obvious because an
exhaustive search is not necessary to match the criterion.

Figure IV-2. Results using various fidelity criteria. All representations
are 2.0 bits/pixel. Upper left: MSE; Upper right: MAE; Lower left: MP;
Lower right: MSE and also assuming image data uniformly distributed within
each block.
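
The exhaustive search used for the MSE and MAE quantizers can be sketched as follows. The function name `one_bit_search` and the tuple it returns are hypothetical; the loop simply tries each of the m-1 ways of splitting the sorted pixel values into a lower and an upper group.

```python
from statistics import mean, median


def one_bit_search(block, criterion="mse"):
    """Return (cost, q, a, b) for the best split of the sorted pixel values."""
    y = sorted(block)
    m = len(y)
    best = None
    for q in range(1, m):                 # q pixels go to the upper level b
        low, high = y[:m - q], y[m - q:]
        if criterion == "mse":
            a, b = mean(low), mean(high)
            cost = (sum((v - a) ** 2 for v in low)
                    + sum((v - b) ** 2 for v in high))
        else:                             # "mae": medians minimize abs error
            a, b = median(low), median(high)
            cost = (sum(abs(v - a) for v in low)
                    + sum(abs(v - b) for v in high))
        if best is None or cost < best[0]:
            best = (cost, q, a, b)
    return best


ramp = list(range(16))
cost, q, a, b = one_bit_search(ramp, "mse")
```

Each call evaluates m-1 candidate splits, which is the extra work that the moment preserving quantizer avoids.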


In some cases it is possible to make an ad hoc assumption of the
probability density of the image in each block. Once the density function
is known (or guessed), the quantizer is immediately specified using either
MP, MSE, or MAE fidelity criteria. This procedure can sometimes be used
quite successfully in television quality "head and shoulders" pictures.
The results obtained using this procedure are usually quite poor when high
resolution imagery is used: more coding artifacts can be seen in the
reconstructed imagery.

One of the disadvantages of BTC is that the compression achieved
corresponds to only 2 bits/pixel. In many image coding schemes it is
desired to obtain data rates in the range of 1.0-1.5 bits/pixel.

As mentioned above, it is necessary to transmit some overhead
information for the quantizer in each block. The information usually
transmitted is $\bar{X}$ and $\bar{\sigma}$. One obvious way of lowering
the number of bits for image representation is to assign fewer than 8 bits
to $\bar{X}$ and $\bar{\sigma}$. Experimental evidence has indicated that
it is possible to code $\bar{X}$ with 6 bits and $\bar{\sigma}$ with 4
bits. This allows for considerable savings and few perceivable errors upon
reconstruction. With the 16-bit bit plane of a 4x4 block, this then gives a
representation of 26/16, or 1.63, bits/pixel. Alternatively, $a$ and $b$
(instead of $\bar{X}$ and $\bar{\sigma}$) could be transmitted to the
receiver and assigned fewer bits. Experimental evidence indicates that the
representation obtained at the receiver is better if $\bar{X}$ and
$\bar{\sigma}$ are transmitted and $a$ and $b$ computed at the receiver;
i.e., the mean needs more precision than the contrast (standard deviation)
for accurate perception.

By choosing the threshold of the quantizer at $\bar{X}$, it has been
observed that partitioning of the data leads to some "unnatural" appearance
of the data. For high resolution imagery, this manifested itself by some
unacceptable coding artifacts. It would be better if somehow the fidelity
criterion allowed for automatic threshold selection as do MSE and MAE. This
can be arrived at by forcing the quantizer to preserve not only the first
two sample moments but also higher moments. A detailed development is
presented in [11]. This method requires extra computation at the
transmitter, but the receiver is not affected. It should be mentioned that
this method of automatic threshold selection is far easier than the MSE or
MAE quantizers discussed above since an exhaustive search is not necessary
to find $a$, $b$, and $q$.
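
One way to see how a third moment can fix the threshold without a search is sketched below. The closed form for q used here is obtained by matching the skewness of a two-level distribution to the sample skewness; it is offered as an assumption for illustration and is not quoted from [11]. The function names, the rounding of q, and its clamping to the range 1 to m-1 are likewise hypothetical.

```python
import math


def btc3_encode(block):
    """Return (mean, sigma, bit plane) with q chosen from the third moment."""
    m = len(block)
    m1 = sum(block) / m
    m2 = sum(x ** 2 for x in block) / m
    m3 = sum(x ** 3 for x in block) / m
    sigma = math.sqrt(max(m2 - m1 * m1, 0.0))
    if sigma < 1e-9:                      # flat block: nothing to threshold
        return m1, 0.0, [0] * m
    # Negative of the sample skewness, written in raw moments.
    skew_term = (3 * m1 * m2 - m3 - 2 * m1 ** 3) / sigma ** 3
    q = round(m / 2 * (1 + skew_term / math.sqrt(skew_term ** 2 + 4)))
    q = min(max(q, 1), m - 1)             # assumption: keep both levels in use
    order = sorted(range(m), key=lambda i: block[i])
    plane = [0] * m
    for i in order[m - q:]:               # the q brightest pixels receive b
        plane[i] = 1
    return m1, sigma, plane


def btc3_decode(mean, sigma, plane):
    """Receiver side: identical to the basic two-moment receiver."""
    m, q = len(plane), sum(plane)
    if q == 0 or q == m:
        return [mean] * m
    a = mean - sigma * math.sqrt(q / (m - q))
    b = mean + sigma * math.sqrt((m - q) / q)
    return [b if bit else a for bit in plane]
```

The receiver is unchanged because q is still recovered by counting ones in the bit plane; only the transmitter's rule for filling the bit plane differs.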

This new threshold technique improved the subtle features (such as near
edges) of the image that are usually important in analysis of aerial
photography imagery. For some of the imagery used in this study the coding
artifacts produced using the sample mean as the threshold were such that
the photo interpreters rated the images poorer. When the third moment
preserving technique was used many of these coding artifacts disappeared
although the mean square error was not significantly changed.

As with all non-information preserving image coding, coding artifacts
are produced in the image. It became apparent very early in this study that
BTC produces artifacts that are very different from those of transform
coding. These artifacts are usually produced in regions around edges and in
low contrast areas indicated by a sloping gray level. As mentioned above,
BTC does produce sharp edges. However, these edges do have a tendency to be
ragged. Transform coding usually produces edges that are blurred and
smooth. The second problem, in low contrast regions, is due to inherent
quantization noise in the one bit quantizer. Here sloping gray levels can
turn into false edges. It should be emphasized that these coding artifacts
are problems in high resolution aerial reconnaissance images where man-made
objects (i.e., edges) are important. These coding artifacts usually are not
a problem in television quality "head and shoulders" imagery.


One of the problems that BTC has is that it is really a one-dimensional
quantization technique. In no way does BTC exploit the two-dimensional
nature of the image within each block as do most other forms of image
coding. Also BTC generally has a poor response near the spatial frequency
of 1/2 cycle per block.

One method to improve both of the problems above is a hybrid
formulation. First a highly compressed Cosine transform coded image is
subtracted from the original image. For the results presented here the
transform picture was obtained by taking the two-dimensional Cosine
transform over 16 x 16 pixel blocks. Only the eight non d.c. coefficients
in the upper left section of each block were retained. This corresponds to
a zonal filtering method. This led to a representation of 0.25 bits/pixel
for the highly compressed image. BTC is then used on this difference
picture and the recombination formed at the receiver. While this does
increase the computational load, the improvement seems to be significant
enough to give this method further attention. Figure IV-3 presents results
of this hybrid method. This technique exploits the edge preservation of BTC
and helps in the low contrast regions of the image by improving the
frequency response. Recently Texas Instruments has done a study of
implementing BTC on an integrated circuit chip using VLSI techniques. This
study indicates BTC could be implemented with a gate count of 3800 and a
maximum delay path of 30 gates [16].
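
The hybrid formulation can be sketched for a single 16 x 16 block as below. Several details are assumptions made only for illustration: the eight retained coefficients are taken to be the upper left 3 x 3 zone without its d.c. term, the retained coefficients are left unquantized, BTC is applied to 4 x 4 sub-blocks of the difference picture with unquantized statistics, and all function names are hypothetical.

```python
import math

N = 16


def dct_matrix(n):
    """Orthonormal DCT-II matrix; rows are basis vectors."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def zonal_lowpass(block, zone=3):
    """Keep the zone x zone low-frequency coefficients except d.c."""
    c = dct_matrix(len(block))
    ct = transpose(c)
    coef = matmul(matmul(c, block), ct)           # forward 2-D transform
    kept = [[coef[u][v] if u < zone and v < zone and (u, v) != (0, 0) else 0.0
             for v in range(len(block))] for u in range(len(block))]
    return matmul(matmul(ct, kept), c)            # inverse 2-D transform


def btc_pixels(pixels):
    """Two-moment BTC of a flat list, encoded and decoded in one step."""
    m = len(pixels)
    mean = sum(pixels) / m
    sigma = math.sqrt(max(sum(x * x for x in pixels) / m - mean * mean, 0.0))
    plane = [x >= mean for x in pixels]
    q = sum(plane)
    if q == 0 or q == m:
        return [mean] * m
    a = mean - sigma * math.sqrt(q / (m - q))
    b = mean + sigma * math.sqrt((m - q) / q)
    return [b if bit else a for bit in plane]


def hybrid_btc(block):
    """Low-pass transform picture plus BTC of the difference picture."""
    low = zonal_lowpass(block)
    out = [row[:] for row in low]
    for r in range(0, N, 4):
        for c in range(0, N, 4):
            cells = [(r + i, c + j) for i in range(4) for j in range(4)]
            resid = [block[u][v] - low[u][v] for u, v in cells]
            for (u, v), val in zip(cells, btc_pixels(resid)):
                out[u][v] += val
    return out
```

Because the d.c. term is excluded from the transform picture, the block mean is carried entirely by the BTC stage, and the recombined block keeps the mean of the original.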

V.  Test Results


V.1.  Original Images and Reconstructed Results

The  two  512  x 512  images  subjected  to  the  various  coding  algorithms 
described  in  this  report  are  shown  in  Figures  V-1  and  V-2.  Both  of  these 
images  were  originally  quantized  at  8 bits  per  pixel.  The  first  image, 
"AP2",  was  chosen  due  to  its  wide  variability  of  image  characteristics.  Of 
interest  are  the  city  areas,  the  airport  runways,  and  the  boats  and  other 
small  objects  in  the  water.  The  second  original,  "SAM3"  is  part  of  the 
Northeast  Test  Site  Area. 

Shown in Figures V-3 through V-6 are several decoded versions of the
"SAM3" original when processed by four of the techniques described in
Sections II, III, and IV. As discussed in Sub-section II.3, Figures V-3 and
V-4 are results obtained from the same basic coding algorithm. However,
Figure V-4 includes spatial domain pre-processing. Figures V-5 and V-6
reflect results of applying the "hybrid" and the "moment preserving block
truncation" technique to the same SAM3 image. In subjective rankings, the
"hybrid" method was usually rated better than the "block truncation"
technique. However, both were typically rated lower than the four class
methods. On the other hand, it should be borne in mind that both of these
algorithms represent much less of a computational burden than the zonal
methods.

In Figures V-7 through V-10 we make a comparison between a spatial
domain method, namely the basic block truncation algorithm, and the four
class zonal method based on spatial domain criteria (see Sub-section
II.2.3). The block truncation results of Figure V-7 were ranked higher by
the photoanalysts than the four class results of Figure V-8. The original
image in both of these cases was "AP2" sampled at a size of 512 x 512. When
a small section of this image, of size 256 x 256, is blown up for
evaluation, the results of Figures V-9 and V-10 are obtained. It is
interesting to note that the expansion of these smaller images resulted in
a relative reversal of the photo-interpreters' evaluation of these two
techniques. The basic block truncation method is now ranked significantly
lower than the spatial domain four class methods.

In order to compare the quality degradation experienced when passing
from 1.6 bits per pixel to a compression rate of 0.5 bits per pixel, we
include Figure V-11. This is the four class frequency domain technique
discussed in Sub-section II.2.2. The image is once again SAM3. The effects
of channel errors on two of the techniques studied in our work are depicted
in Figures V-12 and V-13. The coding method used in Figure V-12 is the
spatial domain four class method. The hybrid system generated the noisy
channel performance shown in Figure V-13. The error rate used in both of these
tests  was  10 
V.2.  Ranking  Results 

The photo analysts were supplied a ranking form for each of the 7 photo
sets. One sample form appears in [1, p.98]. The ranking results appear
in Table V-2. Abbreviations used in this table are defined in Table V-1.
Notice that each of the seven sets was ranked by five analysts. In each
case a ranking of 1 denoted the best (in that analyst's opinion)
reproduction of the original. Space was provided on the form for
additional comments such as excellent (EX) and unacceptable (X). The
presence of either of these two comments appears in the raw data of
Table V-2 as a superscript (EX or X) on the corresponding ranking
number. The right three columns of each set list the average ranking,
the mean square reconstruction error, and the mean absolute
reconstruction error for each method. The average ranking was
calculated by dropping the

4ZCS    four class zone method first presented by Chen and Smith [2].

4ZFC    four class zone method using feature clustering described in
        Sec. II.2.2.

4ZFCP   same as 4ZFC but with pre and post processing as described in
        Sec. II.3.

4ZSF    four class zone using spatial criteria and equal size classes
        described in Sec. II.2.3.

4ZSV    four class zone using spatial criteria and variable size
        classes described in Sec. II.2.3.

HYB     hybrid as described in Sec. III.

BTC     basic block truncation algorithm as described in Sec. IV.

BTCMP   BTC with moment preserving threshold as discussed in Sec. IV.

BTCH    BTC with hybrid addition (1.8 bits/pixel total) as discussed
        in Sec. IV.

BTCMS   BTC with minimum mse (2.0 bits/pixel) as discussed in Sec. IV.

BTCMA   BTC with minimum mae (2.0 bits/pixel) as discussed in Sec. IV.

TABLE V-1: Abbreviations used in ranking results presented in
           Table V-2.


TABLE V-2

"Raw data" as taken from analysts' forms. Also shown for each method are
an average ranking and the computed mean-square error and mean absolute
error for that method. The average rank was calculated by dropping the
highest and lowest rankings and averaging the remaining three. The
method name abbreviations are described in Table V-1.


SAM3, 1.6 bits/pixel, 512x512, no errors

Method  Photo   Analyst                  AVG
        No.     #1   #2   #3   #4   #5   RANK   MSE      MAE
4ZFCP   562     2    1    1EX  2EX  7    1.7    31.60    3.80
4ZSV    512     5    2    3    1EX  2EX  2.3    31.55    3.89
4ZFC    549     1EX  5    2    3EX  6    3.3    28.71    3.74
4ZSF    587     3    6    4    8    5    5.0    34.26    3.91
BTCH    552     6    7    5    6    3EX  5.7    48.43    4.53
BTCMP   501     7    4    8    7    1EX  6.0    50.13    4.69
4ZCS    517     4    8    6    5EX  8    6.3    42.79    4.43
HYB     514     9    9X   9X   4EX  4    7.3    94.37    5.10
BTC     576     8    3    7    9    9    8.0    53.56    4.70

Correlation Coefficients:  Avg. rank - mse  ρ = 0.71
                           Avg. rank - mae  ρ = 0.87

SAM3ex, 1.6 bits/pixel, 256x256, no errors

Method  Photo   Analyst                  AVG
        No.     #1   #2   #3   #4   #5   RANK
4ZFC    453     2    1    1EX  2    3    1.7
4ZFCP   451     1    2    3    2EX  6    2.0
4ZSV    457     4    5    2    3    1EX  3.0
4ZCS    454     3    3    4    5    4    3.7
4ZSF    456     5    4    5    4    2    4.3
HYB     452     6    6X   6X   6X   6X   6.0

SAM3ex, 1.6 bits/pixel, 256x256, no errors

Method  Photo   Analyst                  AVG
        No.     #1   #2   #3   #4   #5   RANK
BTCH    459     1X   2    1    1    1    1.0
BTCMA   462     2X   1    4    3    2    2.3
BTCMS   461     3X   5    2    2    3    2.7
BTCMP   458     4X   4    3    5    4    4.0
BTC     460     5X   3    5X   4    5    4.7


SAM3, 1.6 bits/pixel, 512x512, 10 ^ error prob

Method  Photo   Analyst                  AVG
        No.     #1   #2   #3   #4   #5   RANK   MSE      MAE
4ZSF    523     1    2    1    1    1EX  1.0    45.67    4.27
4ZCS    568     2    1    4    3    2EX  2.3    75.40    4.99
4ZFC    503     3    4    2    2    6    3.0    44.17    4.20
HYB     590     6    3    3    6    3    4.0    103.06   5.47
4ZSV    539     4    5X   5X   4    5    4.7    65.54    4.47
BTCP    573     5    6X   6X   5    4    5.3    93.11    5.14

Correlation Coefficients:  Avg. rank - mse  ρ = 0.62
                           Avg. rank - mae  ρ = 0.46


SAM3, 0.5 bits/pixel, 512x512, no errors

Method  Photo   Analyst                  AVG
        No.     #1   #2   #3   #4   #5   RANK   MSE      MAE
4ZFC    538     2    1    5    1    1EX  1.3    81.61    6.48
4ZFCP   557     1    5    1EX  2    2    1.7    86.21    6.35
4ZSF    592     3    3    3    4    4    3.3    94.60    6.89
4ZCS    542     4    4    2    5    3    3.7    107.70   7.37
4ZSV    554     5    2    4    3    5    4.0    96.24    6.96
HYB     577     6X   6X   6X   6X   6X   6.0    228.18   8.74

Correlation Coefficients:  Avg. rank - mse  ρ = 0.85
                           Avg. rank - mae  ρ = 0.93


AP2, 1.6 bits/pixel, 512x512, no errors

Method  Photo   Analyst                  AVG
        No.     #1   #2   #3   #4   #5   RANK   MSE      MAE
BTC     631     1EX  5    6    3EX  1EX  3.0    76.66    5.70
4ZFCP   672     2EX  3    4    2EX  6    3.0    30.81    3.89
4ZSF    643     6EX  4    1    1EX  5    3.3    24.74    3.58
HYB     636     3EX  6    3    4EX  3EX  3.3    47.76    4.82
BTCMP   627     5EX  1    7    6    2EX  4.3    78.14    5.74
4ZFC    615     4EX  7    2    5    4    4.3    25.20    3.58
4ZSV    606     7EX  2    5    7    7    6.3    36.75    4.44

Correlation Coefficients:  Avg. rank - mse  ρ = -0.12
                           Avg. rank - mae  ρ = -0.03


AP2ex, 1.6 bits/pixel, 256x256, no errors

Method  Photo   Analyst                  AVG
        No.     #1   #2   #3   #4   #5   RANK
4ZFCP   402     1EX  2    1EX  4    2EX  1.7
4ZSF    404     3    1    2    3EX  4    2.7
4ZSV    405     2    4    3    5    1EX  3.0
4ZFC    403     4    3    4    2EX  3EX  3.3
HYB     401     5    7X   5    1EX  5    5.0
BTCMP   406     6X   5    6X   6    6    6.0
BTC     407     7X   6    7X   7    7    7.0

highest and lowest ranking given each method and averaging the remaining
three.

Table V-2 also lists correlation coefficients relating the interpreters'
average ranking of a reconstructed image to its calculated m.s.e. and
m.a.e. (mean absolute error). These were calculated as

$$ \rho = \frac{s_{xy}}{\sigma_x \sigma_y} $$

where $\sigma_x$ is the sample standard deviation of the average
interpreter rankings (across all reconstructed images within a set),
while $\sigma_y$ is the sample standard deviation of either the m.s.e.
or m.a.e. across that same set, and $s_{xy}$ is the sample cross
correlation. The sampling bias effects were not taken into
consideration.
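
As an illustration of the two calculations just described, the following
Python sketch computes a trimmed average rank and the correlation
coefficient from the tabulated values. The function names are ours, and
the use of 1/n (uncorrected) moments is an assumption; the normalization
cancels in the ratio in any case. The input numbers are the average rank
and m.s.e. columns of the first set of Table V-2 (SAM3, 1.6 bits/pixel,
512x512, no errors).

```python
import math


def trimmed_average_rank(ranks):
    """Drop one highest and one lowest ranking, then average the rest."""
    ordered = sorted(ranks)
    kept = ordered[1:-1]
    return sum(kept) / len(kept)


def correlation(x, y):
    """rho = s_xy / (sigma_x * sigma_y), using 1/n moments (assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return s_xy / (sigma_x * sigma_y)


# Five analyst rankings for 4ZFCP in the first set of Table V-2.
ranks_4zfcp = [2, 1, 1, 2, 7]
avg_rank_4zfcp = trimmed_average_rank(ranks_4zfcp)

# Average rank and m.s.e. columns of the same set, in table order.
avg_rank = [1.7, 2.3, 3.3, 5.0, 5.7, 6.0, 6.3, 7.3, 8.0]
mse = [31.60, 31.55, 28.71, 34.26, 48.43, 50.13, 42.79, 94.37, 53.56]
rho_mse = correlation(avg_rank, mse)

print(round(avg_rank_4zfcp, 1), round(rho_mse, 2))
```

With these inputs the trimmed average for 4ZFCP comes out at 1.7 and the
rank-to-m.s.e. correlation at roughly 0.71, in line with the entries
listed for that set.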

V.3.  Discussion

The average rankings in general are fairly well correlated with mean
square error and mean absolute error. However, the variation among
individual analysts' rankings shows the subjectivity of such a test
procedure and/or the similarity of many of the reconstructed results. In
a few cases the subjective rankings differed from the m.s.e. rankings.
This is most noticeable in the case of AP2, where the correlation
coefficient is actually negative. In this particular set, the
reconstructed picture quality was high enough that the analysts could
not agree on similar rankings.

The four class zone methods performed the best in terms of subjective
rankings and mean square error. At 0.5 bits per pixel, they were the
only acceptable methods, and the feature clustering method of bit
assignment [Fig. V-11] was ranked the best by the photoanalysts.
Although some degradation is visible in this image, it received an
"excellent" from one of the analysts.

At 1.6 bits per pixel, all four class zone methods performed well. The
preprocessing to improve performance in the dark regions (Sec. II.3)
resulted in the best overall rankings, thus showing the viability of
such a method as applied to aerial reconnaissance imagery. In terms of
computational load and performance, the preprocessed feature clustering
(4ZFCP) method appears presently to produce the best overall results.

The hybrid method is now a viable alternative of moderate computational
cost to the two-dimensional transform methods, even at error rates up to
10 ^ when coded at 1.6 bits per pixel. Although it was ranked below the
four-class zone methods, it generally provided acceptable results. The
computations involved in the hybrid approach are significantly fewer
than those required by the two-dimensional transform methods. More
importantly, the number of mass storage picture accesses required by
hybrid coding is two, instead of the three required by the 2-D methods.

The best performing spatial technique that we have tested is block
truncation coding. The overall rankings of BTC are generally below those
of the transform techniques. However, under normal viewing conditions
(individual pixels not visible as separate entities), BTC provided
acceptable and, at times, excellent results. This implies that post
processing at the receiver would improve subjective performance when
individual pixels are visible. The computational savings in using this
technique are enormous. This is the only real time, single image access
technique rated acceptable here and in [1, 16].


VI.  Conclusions and Future Research Directions

Over the past 1-1/2 years, our research has indicated that for good
subjective reproduction of high resolution aerial imagery, compression
rates in the range of 0.5 to 1.5 bits/pixel are achievable using
adaptive two-dimensional block transform techniques. Random channel
errors of 10 are insignificant and of 10 ^ are tolerable.

Between  1.0  and  2.0  bits/pixel  hybrid  coding  becomes  almost  competitive 
in  quality  and  offers  computational  advantages. 

Above 1.5 bits/pixel, several spatial coding possibilities exist which
offer tremendous computational and storage advantages as well as
respectable, if not excellent, performance.

We feel that three major areas of study should be investigated to make
the application of these coding methods more desirable and more easily
implemented. These are:

(1)  Efficiency Improvements - The best performing coding methods (2-D
     transforms) are the most computationally intensive. The best
     method requires 3 passes through the data: to categorize each
     16 x 16 block, to collect block statistics, and to do bit
     assignments and coding. Upon observing the bit assignments and
     categorizations, we feel that some standard bit assignments and
     categories could be derived, and the 2-D transform methods made
     much more efficient, by assigning each block to one of these
     predetermined categories and sending a short code to the receiver
     for each block indicating the assigned category. This would
     eliminate all picture storage or rescanning requirements, in that
     each block could be processed and transmitted independently of the
     others. It would also reduce the transmitted overhead, because the
     bit maps can be prestored at the receiver.

(2)  Technique Combinations - Coding methods are usually quite data
     dependent in regard to subjective performance. For example,
     picture blocks consisting of one single bright pixel and many dark
     ones are most efficiently coded using a spatial technique, while
     more gradual variations are better suited to transform coding. We
     have adapted all of our techniques to operate on 16 x 16 blocks so
     that the methods can be mixed to provide an optimum strategy.
     This method is actually an extension of the efficiency method
     suggested in part (1), where the various categories not only
     include various bit assignments but can also indicate various
     coding methods. The category selection would involve frequency
     domain (how much energy in each region) and spatial domain
     (presence of spots or edges) measurements.

(3)  Source Error Correction - It has been mentioned that the presence
     of uncorrected channel errors can cause significant distortion in
     the received picture. The need for bandwidth compression
     eliminates the possibility of using additional bits for channel
     error detection and correction. However, it is possible in many
     cases for the receiver to find source errors and correct them. A
     simple method would be to check the boundary between a block to be
     tested and its neighbors. A discontinuity in all (or most)
     boundary points indicates a bad block, and the receiver could then
     test the dc and other low frequency coefficients to try to locate
     the error. We feel that for several of the coding methods we have
     implemented, such a system would give an order of magnitude
     improvement in receiver performance when operating in the presence
     of channel errors.
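
The boundary check proposed in item (3) can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
function names, the intensity step `threshold`, and the `min_fraction`
of boundary points that must show a step are all hypothetical choices,
not values taken from this study, and the follow-up test of the dc and
low frequency coefficients is not shown.

```python
def boundary_jump_fraction(image, top, left, size, threshold):
    """Fraction of a block's boundary points whose step to the adjacent
    pixel of the neighboring block exceeds `threshold`. Sides lying on
    the image border have no neighbor and are skipped."""
    rows, cols = len(image), len(image[0])
    bottom, right = top + size - 1, left + size - 1
    jumps = 0
    points = 0
    for k in range(size):
        pairs = []
        if top > 0:
            pairs.append((image[top][left + k], image[top - 1][left + k]))
        if bottom < rows - 1:
            pairs.append((image[bottom][left + k], image[bottom + 1][left + k]))
        if left > 0:
            pairs.append((image[top + k][left], image[top + k][left - 1]))
        if right < cols - 1:
            pairs.append((image[top + k][right], image[top + k][right + 1]))
        for inside, outside in pairs:
            points += 1
            if abs(inside - outside) > threshold:
                jumps += 1
    return jumps / points if points else 0.0


def suspect_blocks(image, size=16, threshold=32, min_fraction=0.75):
    """Return (top, left) corners of blocks whose boundaries are
    discontinuous at most points. All parameters are illustrative."""
    rows, cols = len(image), len(image[0])
    flagged = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            fraction = boundary_jump_fraction(image, top, left, size, threshold)
            if fraction >= min_fraction:
                flagged.append((top, left))
    return flagged


# A flat 48 x 48 test picture whose centre block has a shifted mean,
# mimicking a channel error in that block's dc coefficient.
picture = [[100] * 48 for _ in range(48)]
for r in range(16, 32):
    for c in range(16, 32):
        picture[r][c] += 60

print(suspect_blocks(picture))
```

In this synthetic case only the centre block is flagged: its neighbors
each show a step along just one of their sides, which stays below the
assumed `min_fraction`.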

In addition to the general areas outlined above, three specific
techniques that have the potential for development are now described:

Mixed Basis Transforms

In matrix form, an orthogonal transform of an NxN image block G can be
expressed as

$$ \mathcal{T}[G] = U^T G U $$

where a superscript T denotes the matrix transpose operation while each
column of U holds samples of a basis function of the transform type
being used (Cosine, Fourier, Slant, etc.). It is known that since the
above equation can be expressed as

$$ \mathcal{T}[G] = U^T G U = [(GU)^T U]^T $$

the two-dimensional transform operation may be viewed as a three-step
process: 1.) "U-transform" each row of G separately, replacing that row
with its one-dimensional transform; 2.) transpose this new matrix of
one-dimensional transforms. The ith row of the resulting matrix now
contains all the ith basis function weights from row zero of the
original G matrix down through row N-1. (If the rows of G are highly
correlated, the rows of $(GU)^T$ will be slowly-varying from column zero
through column N-1.); 3.) "U-transform" the rows of $(GU)^T$, again
treating these as separate one-dimensional objects.
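
The equivalence between the direct matrix form and the three-step
row/transpose/row procedure can be checked numerically. The Python
sketch below is illustrative only: it assumes U is the orthonormal
cosine (DCT-II) basis, uses a small 4 x 4 block for brevity, and all
function names are ours.

```python
import math


def dct_basis(n):
    """n x n matrix U; column k holds samples of the k-th orthonormal
    DCT-II basis function (an assumed choice of U for this sketch)."""
    u = [[0.0] * n for _ in range(n)]
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        for m in range(n):
            u[m][k] = scale * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
    return u


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def transform_direct(g, u):
    """U^T G U computed as two matrix products."""
    return matmul(matmul(transpose(u), g), u)


def transform_three_step(g, u):
    rows_done = matmul(g, u)        # step 1: transform each row of G
    turned = transpose(rows_done)   # step 2: transpose, giving (GU)^T
    cols_done = matmul(turned, u)   # step 3: transform the rows of (GU)^T
    return transpose(cols_done)     # outer transpose of [(GU)^T U]^T


basis = dct_basis(4)
block = [[float((3 * i + 5 * j) % 7) for j in range(4)] for i in range(4)]
direct = transform_direct(block, basis)
stepwise = transform_three_step(block, basis)
max_difference = max(abs(direct[i][j] - stepwise[i][j])
                     for i in range(4) for j in range(4))
print(max_difference < 1e-9)
```

Passing a different matrix to step 3 than to step 1 would turn the same
routine into a mixed basis transform.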

There is no theoretical or computational reason why the transforms used
in steps 1.) and 3.) need be of the same type [13]. Though it is rarely
done, the "row" transform of step 1.) can be of type U while the column
transform of step 3.) can be of type, say, V. Thus we have a mixed (or
hybrid) transform method expressible as

$$ \mathcal{T}[G] = V^T G U $$

Indeed, the hybrid coding method reported on in Sec. III (and in [1])
does something like this by replacing the V-transform operation with a
DPCM coder.

Since many image blocks will exhibit the high correlation (or
slowly-varying nature) mentioned in connection with step 2.) above, it
should be worthwhile to investigate alternate "V-transforms" that might
compact such slowly-varying "row signals" more effectively than a
repeated application of U. (In all our present cases, U is the cosine
transform.) Of most interest at this early stage would be the slant [14]
transform, since it affords a linearly graded basis function; but others
should also be tried, as should other spatial domain interpolators
besides DPCM.

MAPS  Improvements 

Of those coding algorithms we have studied, the MAPS [15] approach is
the most computationally efficient. Hence, we propose to examine this
method more closely, in the hope of improving its rated performance in
photo reconnaissance work.

The two fundamental aspects of MAPS coding are 1.) its pixel ordering
sequence which, as we have shown in [1, p.45], allows the
coarsely-variable length MAPS records to be transmitted at error rates
as high as 10 ^ without (usually) the loss of transmitter-receiver
synchronization; and 2.) the representation of a spatial group of pixels
by a single number: the group intensity mean. It is this latter aspect
we hope to improve.

Within any local area of an image, the compression rates at which a
MAPS coder may operate are few in number and of progressively wider
spacing. See Table VI-1. Such a coarse partition of the overall
compression range must frequently give rise to a "feast or famine"
situation in which one compression rate affords far more resolution than
desired while the next higher rate yields unacceptably poor performance.
However, if, after each line having an asterisk in Table VI-1, we insert
an additional mode of compression operation, we obtain the somewhat
finer partition of Table VI-2. In each of these new cases we have
provided for two (rather than one or four) words of intensity
information: one 6-bit word (as before) and one (new) 8-bit word. This
latter quantity could be split into subfields as suggested below.
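
The effective rates of Tables VI-1 and VI-2 follow from dividing the
bits transmitted per group by the number of pixels in the group. The
Python sketch below reproduces that arithmetic. The per-group overhead
figures are not stated separately in the text; here they are inferred by
subtracting the 6-bit intensity words from the "Bits Transmitted" column
of Table VI-1, which is an assumption of this sketch, as are the names
used.

```python
# Overhead bits per group, inferred from Table VI-1 as
# (bits transmitted) - 6 * (number of merged intensities).
OVERHEAD_BITS = {2: 6.0937, 4: 6.375, 8: 7.5, 16: 12.0}

# Intensity bits for the two existing modes and the proposed one.
INTENSITY_BITS = {
    "four 6-bit words": 4 * 6,
    "6-bit + 8-bit word (proposed)": 6 + 8,
    "one 6-bit word": 6,
}


def bits_per_pixel(group_side, intensity_bits):
    """Effective rate for a square group of group_side x group_side."""
    total_bits = intensity_bits + OVERHEAD_BITS[group_side]
    return total_bits / (group_side * group_side)


for side in (2, 4, 8, 16):
    for mode, bits in INTENSITY_BITS.items():
        rate = bits_per_pixel(side, bits)
        print(f"{side:2d} x {side:<2d}  {mode:30s} {rate:.4f}")
```

The computed rates closely reproduce the tabulated ones; the 4 x 4
two-word entry, for instance, comes out near 1.273 bits/pixel against
the listed 1.275.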

Some quantities for which this new 8-bit field could be reserved are:

1.)  A variance, which, together with the mean, could establish a
     random number generator which would define pixel intensities
     within a single block at the receiver. This should be effective in
     removing the block edge artifacts currently apparent in MAPS-coded
     images. (Something like this was suggested in [15], though no
     separate code word was to be reserved for it.) This variance
     quantity would be in some way dependent on a.) the block size,
     b.) the pixel intensity variance within the block, or c.) the
     intensity variances at the block edges. If no multiplications are
     desired, the square root of the variance (s.d.) could be replaced
     by a mean absolute value.

2. )  The  6 and  8-bit  work  cou  j :• 
to  allow  the  transm 
c.)  coarse  slope 
Such  a plane

            Merged
            Intensities   Bits Transmitted       Effective
Group Size  in Group      (Including Overhead)   Compression Rate

* 2 x 2     4             30.0937                7.5 bits/pixel
  2 x 2     1             12.0937                3.02
* 4 x 4     4             30.375                 1.897
  4 x 4     1             12.375                 .773
* 8 x 8     4             31.5                   .4923
  8 x 8     1             13.5                   .2109
* 16 x 16   4             36.0                   .1405
  16 x 16   1             18.0                   .0703

Table VI-1:  The instantaneous coding rates available to the MAPS
             algorithm as implemented at Purdue.


            Merged
            Intensities   Bits Transmitted       Effective
Group Size  in Group      (Including Overhead)   Compression Rate

  2 x 2     4             30.0937                7.5 bits/pixel
* 2 x 2     2             20.0937                5.02
  2 x 2     1             12.0937                3.02
  4 x 4     4             30.375                 1.897
* 4 x 4     2             20.375                 1.275
  4 x 4     1             12.375                 .773
  8 x 8     4             31.5                   .4923
* 8 x 8     2             21.5                   .3360
  8 x 8     1             13.5                   .2109
  16 x 16   4             36.0                   .1405
* 16 x 16   2             26.0                   .1016
  16 x 16   1             18.0                   .0703

Table VI-2:  Inserting new MAPS coding rate levels (marked with
             an asterisk) results in a somewhat finer partition
             for better adaptive coding.

3.)  The 8-bit word could be split into two subfields to allow the
     specification of a.) an interpolating basis function amplitude,
     and b.) which basis function. For the latter, three bits could
     select one of the eight low frequency 2-D cosine transform basis
     functions nearest d.c.
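
The subfield split suggested in item 3.) can be illustrated with a
short Python sketch. With three of the eight bits used as the basis
function selector, five bits remain for the amplitude. The bit layout
(amplitude in the high bits), the unsigned amplitude code, and the
function names are assumptions made for illustration; the text does not
specify them.

```python
SELECTOR_BITS = 3                     # selects one of 8 basis functions
AMPLITUDE_BITS = 8 - SELECTOR_BITS    # remaining 5 bits of the 8-bit word


def pack_field(amplitude_code, basis_index):
    """Pack a quantized amplitude code and a basis function index into
    one 8-bit word (assumed layout: amplitude high, selector low)."""
    if not 0 <= amplitude_code < (1 << AMPLITUDE_BITS):
        raise ValueError("amplitude code does not fit in 5 bits")
    if not 0 <= basis_index < (1 << SELECTOR_BITS):
        raise ValueError("basis index does not fit in 3 bits")
    return (amplitude_code << SELECTOR_BITS) | basis_index


def unpack_field(word):
    """Recover (amplitude_code, basis_index) from an 8-bit word."""
    return word >> SELECTOR_BITS, word & ((1 << SELECTOR_BITS) - 1)


word = pack_field(19, 5)
print(word, unpack_field(word))
```

How the 5-bit code maps to an actual amplitude (step size, sign) would
have to be chosen separately.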

BTC  Improvements 

We  have  recently  been  experimenting  with  both  pre-  and  post-processing 
of  the  image  relative  to  BTC.  The  post-processing  has  significantly  reduced 
the  m.s.e.  of  the  reconstructed  image  and  we  feel  the  subjective  performance 
will  improve.  Much  of  the  visible  error  in  BTC  is  sharp  edge  introduction 
due  to  the  one  bit  quantization.  The  post-processing  takes  the  form  of  an 
optimum  estimation  procedure  at  the  receiver  based  on  picture  and  artifact 
statistics.  The  pre-processing  we  have  done  is  that  of  selective  blurring 
to  enhance  the  threshold  selection.  We  feel  both  of  these  methods  should  be 
explored  to  improve  the  performance  of  BTC. 
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